feat: Integrate Chutes API with Kimi K2.5-TEE model #7
base: main
Conversation
- Add ChutesClient class for Chutes API (https://api.chutes.ai/v1)
- Support CHUTES_API_KEY environment variable for authentication
- Set moonshotai/Kimi-K2.5-TEE as default model
- Enable thinking mode by default with `<think>...</think>` parsing (see the sketch after this list)
- Use Kimi K2.5 recommended parameters (temp=1.0, top_p=0.95)
- Increase context limit to 256K tokens
- Add openai>=1.0.0 dependency for the OpenAI-compatible API client
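For illustration, a minimal sketch of what `<think>...</think>` parsing can look like; the function name and exact handling are assumptions, not the PR's actual implementation:

```python
import re

def split_thinking(text: str) -> tuple[str, str]:
    """Split a model reply into (thinking, visible) parts.

    Hypothetical helper: extracts the first <think>...</think> block
    and returns the remaining text as the visible answer.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text
    thinking = match.group(1).strip()
    visible = (text[:match.start()] + text[match.end():]).strip()
    return thinking, visible
```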
📝 Walkthrough
The pull request introduces multi-provider LLM support by adding a Chutes API client with thinking-mode capabilities as the default provider, alongside OpenRouter as a fallback. The implementation includes a factory function for provider selection, updated configuration defaults for the Kimi K2.5-TEE model, and extended cost/token tracking across both providers.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Agent as Agent (main)
    participant Config as CONFIG
    participant Factory as get_llm_client()
    participant ChutesC as ChutesClient
    participant LiteLLMC as LiteLLMClient
    participant API as Chutes/OpenRouter API

    Agent->>Config: read provider setting
    Config-->>Agent: provider = "chutes" (or fallback)
    Agent->>Factory: get_llm_client(provider, model, cost_limit, enable_thinking)
    alt provider == "chutes"
        Factory->>ChutesC: instantiate with auth, thinking_mode
        ChutesC-->>Factory: client ready
    else provider == "openrouter"
        Factory->>LiteLLMC: instantiate with litellm config
        LiteLLMC-->>Factory: client ready
    end
    Factory-->>Agent: llm_client
    Agent->>ChutesC: chat(messages, temperature, max_tokens)
    ChutesC->>API: request (with thinking mode params)
    API-->>ChutesC: response (thinking + content)
    ChutesC->>ChutesC: extract thinking_content
    ChutesC-->>Agent: LLMResponse(thinking, cost, usage)
    Agent->>Agent: run agent loop with response
```
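The factory step in the diagram can be pictured roughly as follows. This is a sketch assuming the signature shown in the diagram; the actual code in `src/llm/client.py` may differ:

```python
from typing import Optional

def get_llm_client(provider: str, model: str,
                   cost_limit: Optional[float] = None,
                   enable_thinking: bool = True):
    """Hypothetical provider-selection factory mirroring the diagram.

    ChutesClient and LiteLLMClient are the two clients named in the
    walkthrough; the fallback behavior here is assumed.
    """
    if provider == "chutes":
        return ChutesClient(model=model, cost_limit=cost_limit,
                            enable_thinking=enable_thinking)
    if provider == "openrouter":
        return LiteLLMClient(model=model, cost_limit=cost_limit)
    raise ValueError(f"Unknown LLM provider: {provider}")
```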
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ❌ 1 failed (warning) | ✅ 2 passed
Part of Umbrella PR: #6 (Epic: Complete Chutes API Integration). This PR is the first step in the stacked PR sequence; please see #6 for the complete merge strategy.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/llm/client.py`:
- Around line 102-164: The ChutesClient currently only reads CHUTES_API_TOKEN so
users setting CHUTES_API_KEY (as documented) will get an auth error; update the
token retrieval in ChutesClient.__init__ to accept either environment variable
(check CHUTES_API_TOKEN first, then CHUTES_API_KEY or vice versa) and set
self._api_token accordingly, and update the raised LLMError message to reference
both env var names; ensure the later OpenAI client initialization still uses
self._api_token.
🧹 Nitpick comments (1)
pyproject.toml (1)
30-30: Consider consolidating dependency declarations.
`openai>=1.0.0` is declared in both `requirements.txt` and `pyproject.toml` with matching versions. If both files are intentional (e.g., for tool compatibility or development workflows), keep the two declarations aligned as standard practice.
```python
class ChutesClient:
    """LLM Client for Chutes API with Kimi K2.5-TEE.

    Chutes API is OpenAI-compatible, hosted at https://llm.chutes.ai/v1
    Default model: moonshotai/Kimi-K2.5-TEE with thinking mode enabled.

    Environment variable: CHUTES_API_TOKEN

    Kimi K2.5 parameters:
    - Thinking mode: temperature=1.0, top_p=0.95
    - Instant mode: temperature=0.6, top_p=0.95
    - Context window: 256K tokens
    """

    def __init__(
        self,
        model: str = CHUTES_DEFAULT_MODEL,
        temperature: Optional[float] = None,
        max_tokens: int = 16384,
        cost_limit: Optional[float] = None,
        enable_thinking: bool = True,
        # Legacy params (kept for compatibility)
        cache_extended_retention: bool = True,
        cache_key: Optional[str] = None,
    ):
        self.model = model
        self.max_tokens = max_tokens
        self.cost_limit = cost_limit or float(os.environ.get("LLM_COST_LIMIT", "100.0"))
        self.enable_thinking = enable_thinking

        # Set temperature based on thinking mode if not explicitly provided
        if temperature is None:
            params = KIMI_K25_THINKING_PARAMS if enable_thinking else KIMI_K25_INSTANT_PARAMS
            self.temperature = params["temperature"]
        else:
            self.temperature = temperature

        self._total_cost = 0.0
        self._total_tokens = 0
        self._request_count = 0
        self._input_tokens = 0
        self._output_tokens = 0
        self._cached_tokens = 0

        # Get API token
        self._api_token = os.environ.get("CHUTES_API_TOKEN")
        if not self._api_token:
            raise LLMError(
                "CHUTES_API_TOKEN environment variable not set. "
                "Get your API token at https://chutes.ai",
                code="authentication_error"
            )

        # Import and configure OpenAI client for Chutes API
        try:
            from openai import OpenAI
            self._client = OpenAI(
                api_key=self._api_token,
                base_url=CHUTES_API_BASE,
            )
        except ImportError:
            raise ImportError("openai not installed. Run: pip install openai")
```
Support the documented CHUTES_API_KEY env var to prevent auth failures.
The client only checks CHUTES_API_TOKEN. If users follow the documented CHUTES_API_KEY, auth will fail. Accept both.
🔧 Suggested fix

```diff
-        self._api_token = os.environ.get("CHUTES_API_TOKEN")
+        self._api_token = (
+            os.environ.get("CHUTES_API_KEY")
+            or os.environ.get("CHUTES_API_TOKEN")
+        )
         if not self._api_token:
             raise LLMError(
-                "CHUTES_API_TOKEN environment variable not set. "
+                "CHUTES_API_KEY (or CHUTES_API_TOKEN) environment variable not set. "
                 "Get your API token at https://chutes.ai",
                 code="authentication_error"
             )
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
class ChutesClient:
    """LLM Client for Chutes API with Kimi K2.5-TEE.

    Chutes API is OpenAI-compatible, hosted at https://llm.chutes.ai/v1
    Default model: moonshotai/Kimi-K2.5-TEE with thinking mode enabled.

    Environment variable: CHUTES_API_TOKEN

    Kimi K2.5 parameters:
    - Thinking mode: temperature=1.0, top_p=0.95
    - Instant mode: temperature=0.6, top_p=0.95
    - Context window: 256K tokens
    """

    def __init__(
        self,
        model: str = CHUTES_DEFAULT_MODEL,
        temperature: Optional[float] = None,
        max_tokens: int = 16384,
        cost_limit: Optional[float] = None,
        enable_thinking: bool = True,
        # Legacy params (kept for compatibility)
        cache_extended_retention: bool = True,
        cache_key: Optional[str] = None,
    ):
        self.model = model
        self.max_tokens = max_tokens
        self.cost_limit = cost_limit or float(os.environ.get("LLM_COST_LIMIT", "100.0"))
        self.enable_thinking = enable_thinking

        # Set temperature based on thinking mode if not explicitly provided
        if temperature is None:
            params = KIMI_K25_THINKING_PARAMS if enable_thinking else KIMI_K25_INSTANT_PARAMS
            self.temperature = params["temperature"]
        else:
            self.temperature = temperature

        self._total_cost = 0.0
        self._total_tokens = 0
        self._request_count = 0
        self._input_tokens = 0
        self._output_tokens = 0
        self._cached_tokens = 0

        # Get API token
        self._api_token = (
            os.environ.get("CHUTES_API_KEY")
            or os.environ.get("CHUTES_API_TOKEN")
        )
        if not self._api_token:
            raise LLMError(
                "CHUTES_API_KEY (or CHUTES_API_TOKEN) environment variable not set. "
                "Get your API token at https://chutes.ai",
                code="authentication_error"
            )

        # Import and configure OpenAI client for Chutes API
        try:
            from openai import OpenAI
            self._client = OpenAI(
                api_key=self._api_token,
                base_url=CHUTES_API_BASE,
            )
        except ImportError:
            raise ImportError("openai not installed. Run: pip install openai")
```
🧰 Tools
🪛 Ruff (0.14.14)
[warning] 124-124: Unused method argument: `cache_extended_retention` (ARG002)
[warning] 125-125: Unused method argument: `cache_key` (ARG002)
[warning] 149-153: Avoid specifying long messages outside the exception class (TRY003)
[warning] 163-163: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
[warning] 163-163: Avoid specifying long messages outside the exception class (TRY003)
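For the B904 finding, a minimal sketch of the chained-exception form applied to the import guard above (message text unchanged):

```python
# Chaining the original ImportError (B904) keeps the real cause in the
# traceback instead of reporting it as an error raised during error handling.
try:
    from openai import OpenAI
except ImportError as err:
    raise ImportError("openai not installed. Run: pip install openai") from err
```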
Summary
This PR integrates the Chutes API with the Kimi K2.5-TEE model for the agent.
Changes
- Enable thinking mode with `<think>...</think>` parsing
Testing
```bash
python3 -c "from src.llm.client import ChutesClient; print('OK')"
```
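Beyond the import smoke test, a hypothetical end-to-end check might look like this; it assumes `CHUTES_API_KEY` (or `CHUTES_API_TOKEN`) is set and that `chat()` and `LLMResponse` match what the sequence diagram shows:

```python
from src.llm.client import ChutesClient

client = ChutesClient()  # defaults: moonshotai/Kimi-K2.5-TEE, thinking mode on
response = client.chat(
    messages=[{"role": "user", "content": "Say hello."}],
    max_tokens=256,
)
print(response.thinking)  # text extracted from <think>...</think>
print(response.content)   # visible reply (attribute name is an assumption)
```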